Enterprise Database Systems
DevOps for Data Scientists
DevOps for Data Scientists: Containers for Data Science
DevOps for Data Scientists: Data DevOps Concepts
DevOps for Data Scientists: Data Science DevOps
DevOps for Data Scientists: Deploying Data DevOps

DevOps for Data Scientists: Containers for Data Science

Course Number: it_dsdods_04_enus
Lesson Objectives

  • discover the key concepts covered in this course
  • describe the use of containers for data science
  • describe approaches to infrastructure as code for data deployment
  • describe Ansible and Vagrant approaches to data science deployment
  • describe provisioning tools used in data science
  • use Docker to build a data model
  • use Docker to perform model testing for deployment
  • use Docker to manage R deployments
  • use Docker for a PostgreSQL deployment
  • create a Docker persistent volume
  • use Jupyter Docker Stacks to get up and running with Jupyter
  • use the Anaconda distribution to run a Jupyter Notebook
  • use Jupyter Notebooks with a Cookiecutter data science project
  • use Docker Compose with PostgreSQL and Jupyter Notebooks
  • use a container deployment for Jupyter Notebooks with R
  • use a container strategy for a Jupyter deployment

Overview/Description

In this 16-video course, explore the use of containers to deploy data science solutions, using Docker with R, Python, Jupyter, and Anaconda. Begin with an introduction to containers and their role in deployment and data science. Then examine approaches to infrastructure as code for data deployment and the concepts behind Ansible and Vagrant approaches to data science deployment. Explore the main features of provisioning tools used in data science. You will learn how to use Docker to build data models, perform model testing for deployment, manage R deployments, and run a PostgreSQL deployment, and discover how to create Docker persistent volumes. Next, look at using Jupyter Docker Stacks to get up and running with Jupyter and using the Anaconda distribution to run a Jupyter Notebook. This leads into using Jupyter Notebooks with a Cookiecutter data science project. Then learn about using Docker Compose with PostgreSQL and Jupyter Notebooks, and using a container deployment for Jupyter Notebooks with R. The concluding exercise involves deploying Jupyter.
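To give a flavor of the PostgreSQL deployment and persistent volume topics in this course, here is a minimal sketch using the Docker SDK for Python. The image tag, container name, volume name, and password are illustrative assumptions, not course materials, and the sketch presumes a running local Docker daemon and an installed docker package.

```python
import docker

# Connect to the local Docker daemon (assumes Docker is running).
client = docker.from_env()

# Create a named volume so the database files outlive the container.
volume = client.volumes.create(name="pgdata-demo")  # hypothetical volume name

# Run PostgreSQL with the volume mounted at its data directory.
container = client.containers.run(
    "postgres:15",                       # official PostgreSQL image
    name="ds-postgres-demo",             # hypothetical container name
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},  # demo-only credential
    ports={"5432/tcp": 5432},
    volumes={"pgdata-demo": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)

print(container.status)  # "created" at first; container.reload() refreshes it
```

The same pattern extends to the Jupyter Docker Stacks topic by swapping in, for example, the jupyter/datascience-notebook image and its notebook port.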



Target Audience

Prerequisites: none

DevOps for Data Scientists: Data DevOps Concepts

Course Number: it_dsdods_01_enus
Lesson Objectives

  • discover the subject areas covered in this course
  • define the use and application of DevOps for data science and machine learning
  • describe topological considerations for data science and DevOps
  • apply high-level organizational and cultural strategies for data science with DevOps
  • describe the specific day-to-day tasks of DevOps for data science
  • assess technological risks and uncertainties when implementing DevOps for data science
  • describe scaling approaches to data science using DevOps
  • identify how DevOps can improve communication for data science workflows
  • identify how DevOps can help overcome ad hoc approaches to data science
  • describe considerations for ETL pipeline workflow improvements through DevOps
  • describe the microservice approach to machine learning
  • create a diagram of your data science infrastructure

Overview/Description

To carry out DevOps for data science, you need to extend the ideas of DevOps to be compatible with the processes of data science and machine learning (ML). In this 12-video course, learners explore the concepts behind integrating data and DevOps. Begin by looking at applications of DevOps for data science and ML, then examine topological considerations for data science and DevOps. This leads into applying high-level organizational and cultural strategies for data science with DevOps, and a look at the day-to-day tasks of DevOps for data science. Examine the technological risks and uncertainties of implementing DevOps for data science, along with scaling approaches to data science in terms of DevOps computing elements. Learn how DevOps can improve communication for data science workflows and help overcome ad hoc approaches to data science. Considerations for ETL (extract, transform, and load) pipeline workflow improvements through DevOps and the microservice approach to ML are also covered. The exercise involves creating a diagram of your data science infrastructure.
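To make the microservice approach to ML concrete, here is a minimal sketch of a prediction service built with Flask and scikit-learn. The model, route, and request schema are illustrative assumptions; a production service would load a versioned, serialized model artifact rather than training at startup.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Train a small stand-in model at startup; a real microservice would
# load a serialized model produced by the training pipeline instead.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"features": [5.1, 3.5, 1.4, 0.2]} (hypothetical schema).
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(port=5000)
```

Wrapping the model behind a small HTTP interface like this is what lets it be deployed, scaled, and rolled back independently of the rest of the data pipeline.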



Target Audience

Prerequisites: none

DevOps for Data Scientists: Data Science DevOps

Course Number: it_dsdods_02_enus
Lesson Objectives

  • discover the subject areas covered in this course
  • examine a Cookiecutter project structure
  • modify a Cookiecutter project to train and test a model
  • describe the steps in the data model life cycle
  • describe the benefits of version control for data science
  • describe tools and approaches to continuous integration for data models
  • describe approaches to data and model security for Data DevOps
  • describe approaches to automated model testing for Data DevOps
  • identify Data DevOps considerations for data science tools and IDEs
  • identify approaches to monitoring data models
  • describe approaches to logging for data models
  • identify ways to measure model performance in production
  • add directives to the makefile to prepare for continuous integration
  • implement a data integration task with Jenkins
  • implement data integration with Travis CI
  • incorporate a model into a Cookiecutter project

Overview/Description

In this 16-video course, learners discover the steps involved in applying DevOps to data science, including integration, packaging, deployment, monitoring, and logging. You will begin by learning how to install a Cookiecutter project for data science, then look at its structure, and discover how to modify a Cookiecutter project to train and test a model. Examine the steps in the data model life cycle and the benefits of version control for data science. Explore tools and approaches to continuous integration for data models, approaches to data and model security for Data DevOps, and approaches to automated model testing for Data DevOps. Learn about Data DevOps considerations for data science tools and IDEs (integrated development environments), along with approaches to monitoring and logging for data models. You will examine ways to measure model performance in production and look at data integration with Cookiecutter. Then learn how to implement a data integration task with both Jenkins and Travis CI (continuous integration). The concluding exercise involves incorporating a model into a Cookiecutter project.
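As a taste of the automated model testing covered here, the sketch below shows the kind of pytest check a Jenkins or Travis CI job might run on every commit. The dataset, accuracy floor, and file name are illustrative assumptions rather than the course's own materials.

```python
# test_model.py -- run with `pytest` from a CI step (hypothetical file name)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def test_model_meets_accuracy_floor():
    # Hold out a test split so the check reflects generalization,
    # not memorization of the training data.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Fail the build if accuracy drops below an agreed floor
    # (0.9 is an arbitrary threshold for this sketch).
    assert model.score(X_test, y_test) >= 0.9
```

A makefile directive such as a test target that invokes pytest is then all the CI server needs to gate merges on model quality.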



Target Audience

Prerequisites: none

DevOps for Data Scientists: Deploying Data DevOps

Course Number: it_dsdods_03_enus
Lesson Objectives

  • discover the key concepts covered in this course
  • serialize models using Python and pickle
  • describe tools and approaches to model packaging and deployment
  • describe the blue-green deployment strategy for Data DevOps
  • describe the canary deployment strategy for Data DevOps
  • describe approaches to rolling back model versions
  • explore approaches to deploying models to web APIs
  • use Python and Pandas to serialize a model

Overview/Description

In this 8-video course, learners explore deploying data models into production through serialization, packaging, deployment, and rollback. You will begin by learning how to serialize models using Python and Pandas, then take a look at tools and approaches to model packaging and deployment. Next, explore the blue-green deployment strategy for Data DevOps, a strategy for upgrading running software by switching traffic between two parallel environments. This leads into the canary deployment strategy for Data DevOps; canary deployments can be regarded as a phased or test rollout of updates and new features to a small subset of users. Then take a look at versioning and approaches to rolling back models for machine learning with DevOps. Finally, you will learn about considerations for deploying models to web APIs (application programming interfaces). The concluding exercise involves creating a model by using Python and Pandas, then serializing the results of the model to a file.
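As a flavor of the serialization step that opens and closes this course, here is a minimal sketch using Python, Pandas, and pickle. The training data, model choice, and file name are illustrative assumptions, not the course's own exercise files.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# A tiny stand-in training set (hypothetical data).
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2.1, 4.2, 5.9, 8.1, 9.8]})
model = LinearRegression().fit(df[["x"]], df["y"])

# Serialize the fitted model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back, as a deployment target would at startup.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(pd.DataFrame({"x": [6]})))
```

Keeping serialized model files like this under version control, or in an artifact store, is what makes the rollback strategies discussed above practical: an earlier model version can simply be reloaded.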



Target Audience

Prerequisites: none
